Kendall’s Tau (Rank Correlation) — Measure + Hypothesis Test#
Kendall’s tau answers a concrete ordering question:
If I pick two observations at random, how often do x and y agree on which one is larger?
It’s a non-parametric measure of monotonic association (excellent for ordinal data), and it naturally supports a hypothesis test for association / independence.
Learning goals#
By the end you can:
explain concordant vs discordant pairs (the entire statistic is built from this)
compute \(\tau\) (tau-a and tau-b) from scratch with NumPy
run and interpret a permutation test for H0: no association
interpret \(\tau\) as a probability difference (not “percent correlation”)
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(42)
When to use Kendall’s tau#
Use Kendall’s tau when:
your variables are ordinal (ranks, ratings, Likert scales) or you mostly trust the ordering
you expect a monotonic relationship (increasing/decreasing, not necessarily linear)
you want something fairly robust to outliers compared to Pearson correlation
Common alternatives:
Pearson: measures linear association (sensitive to outliers; assumes more structure)
Spearman’s rho: correlation of ranks (also monotonic; different weighting than tau)
Kendall’s tau is often the most interpretable when you want to reason in terms of pairwise ordering agreement.
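As a quick side-by-side (a sketch on simulated data; the seed and sample size are arbitrary), here is how the three measures behave on a perfectly monotone but very nonlinear relationship:

```python
import numpy as np

# Simulated data (assumed for illustration): y is a strictly increasing
# but wildly nonlinear function of x.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.exp(2 * x)

def rank(a):
    # Ranks for tie-free data: each value's position in sorted order.
    return a.argsort().argsort()

pearson = np.corrcoef(x, y)[0, 1]
spearman = np.corrcoef(rank(x), rank(y))[0, 1]

i, j = np.triu_indices(x.size, k=1)
tau = np.sum(np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])) / i.size

print(f"Pearson  = {pearson:.3f}")   # well below 1 (pulled down by the curvature)
print(f"Spearman = {spearman:.3f}")  # 1.000: the ranks are identical
print(f"tau-a    = {tau:.3f}")       # 1.000: every pair is concordant
```

Spearman and tau both reach 1 because only the ordering matters; Pearson is dragged down by the curvature and heavy tail of \(e^{2x}\).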
1) The core idea: concordant vs discordant pairs#
Take any pair of observations \((i, j)\).
Define the pairwise differences:
\(\Delta x = x_i - x_j\)
\(\Delta y = y_i - y_j\)
Look at the signs:
if \(\Delta x\) and \(\Delta y\) have the same sign, the pair is concordant
if they have opposite signs, the pair is discordant
if either difference is 0, you have a tie in \(x\), \(y\), or both
A convenient encoding is the pair contribution:
\[
a_{ij} = \operatorname{sign}(x_i - x_j)\,\operatorname{sign}(y_i - y_j)
\]
Summing those contributions over all pairs (\(i < j\)) gives Kendall’s S statistic:
\[
S = \sum_{i < j} a_{ij} = C - D
\]
where \(C\) and \(D\) are the concordant and discordant pair counts. From \(S\) we get tau-a (no tie correction):
\[
\tau_a = \frac{S}{\binom{n}{2}}
\]
And tau-b (tie-corrected; usually preferred for ordinal/discrete data):
\[
\tau_b = \frac{S}{\sqrt{(n_0 - n_1)(n_0 - n_2)}}
\]
where:
\(n_0 = \binom{n}{2}\) is the total number of pairs
\(n_1\) is the number of pairs tied in \(x\)
\(n_2\) is the number of pairs tied in \(y\)
Under independence, \(S\) (and therefore \(\tau\)) is centered around 0.
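To make the bookkeeping concrete before the reference implementation below, here is a tiny hand-check of these formulas (the four points are made up for illustration):

```python
import numpy as np
from itertools import combinations

# Four made-up points with one tie in x and one tie in y
x = np.array([1, 2, 2, 3])
y = np.array([1, 2, 3, 3])

C = D = n1 = n2 = 0
for i, j in combinations(range(x.size), 2):
    sx, sy = np.sign(x[i] - x[j]), np.sign(y[i] - y[j])
    C += int(sx * sy > 0)  # concordant pair
    D += int(sx * sy < 0)  # discordant pair
    n1 += int(sx == 0)     # pair tied in x (includes ties in both)
    n2 += int(sy == 0)     # pair tied in y (includes ties in both)

n0 = x.size * (x.size - 1) // 2  # total number of pairs
S = C - D
tau_a = S / n0
tau_b = S / np.sqrt((n0 - n1) * (n0 - n2))
print(f"C={C} D={D} S={S} tau_a={tau_a:.3f} tau_b={tau_b:.3f}")
# C=4 D=0 S=4 tau_a=0.667 tau_b=0.800
```

Note how the two tied pairs shrink the tau-b denominator from 6 to 5, so tau-b is larger than tau-a here.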
def _clean_xy(x, y):
"""Return 1D arrays with rows containing NaNs removed."""
x = np.asarray(x)
y = np.asarray(y)
if x.shape != y.shape:
raise ValueError(f"x and y must have the same shape, got {x.shape} and {y.shape}.")
x = np.ravel(x)
y = np.ravel(y)
# np.isnan works for numeric dtypes; for non-numeric inputs this will raise.
mask = ~(np.isnan(x) | np.isnan(y))
return x[mask], y[mask]
def kendall_pair_counts(x, y):
"""Compute concordant/discordant/tie counts for Kendall's tau.
Returns a dict with:
- n: number of observations
- n_pairs: number of pairs (n choose 2)
- C: #concordant
- D: #discordant
- T_x: #ties in x only
- T_y: #ties in y only
- T_xy: #ties in both x and y
- S: C - D
This is an O(n^2) reference implementation meant for learning.
"""
x, y = _clean_xy(x, y)
n = x.size
if n < 2:
return dict(n=int(n), n_pairs=0, C=0, D=0, T_x=0, T_y=0, T_xy=0, S=0)
i, j = np.triu_indices(n, k=1)
dx = x[i] - x[j]
dy = y[i] - y[j]
sx = np.sign(dx)
sy = np.sign(dy)
prod = sx * sy
C = int(np.sum(prod > 0))
D = int(np.sum(prod < 0))
T_x = int(np.sum((sx == 0) & (sy != 0)))
T_y = int(np.sum((sy == 0) & (sx != 0)))
T_xy = int(np.sum((sx == 0) & (sy == 0)))
S = C - D
return dict(
n=int(n),
n_pairs=int(i.size),
C=C,
D=D,
T_x=T_x,
T_y=T_y,
T_xy=T_xy,
S=int(S),
)
def kendall_tau_a(x, y):
"""Kendall's tau-a (no tie correction)."""
counts = kendall_pair_counts(x, y)
n_pairs = counts["n_pairs"]
if n_pairs == 0:
return np.nan, counts
tau = counts["S"] / n_pairs
return float(tau), counts
def kendall_tau_b(x, y):
"""Kendall's tau-b (tie-corrected)."""
counts = kendall_pair_counts(x, y)
C = counts["C"]
D = counts["D"]
T_x = counts["T_x"]
T_y = counts["T_y"]
denom = np.sqrt((C + D + T_x) * (C + D + T_y))
tau = counts["S"] / denom if denom != 0 else np.nan
counts = {**counts, "denom": float(denom)}
return (float(tau) if np.isfinite(tau) else np.nan), counts
2) A tiny example you can see#
We’ll use a small dataset so we can reason about pairs directly.
the scatter plot shows the data
the bar chart shows how many pairs are concordant vs discordant vs tied
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 2, 4, 3, 5]) # one inversion (3 and 4 swap)
tau_b, counts = kendall_tau_b(x, y)
print(f"tau-b = {tau_b:.3f}")
counts
tau-b = 0.800
{'n': 5,
'n_pairs': 10,
'C': 9,
'D': 1,
'T_x': 0,
'T_y': 0,
'T_xy': 0,
'S': 8,
'denom': 10.0}
fig = go.Figure()
fig.add_trace(
go.Scatter(
x=x,
y=y,
mode="markers+text",
text=[str(i) for i in range(len(x))],
textposition="top center",
marker=dict(size=10),
)
)
fig.update_layout(
title="Tiny example (point labels are indices)",
xaxis_title="x",
yaxis_title="y",
)
fig.show()
labels = ["Concordant (C)", "Discordant (D)", "Tie in x (T_x)", "Tie in y (T_y)", "Tie in both (T_xy)"]
values = [counts["C"], counts["D"], counts["T_x"], counts["T_y"], counts["T_xy"]]
fig = px.bar(
x=labels,
y=values,
title="Pair types that build Kendall’s tau",
labels={"x": "pair type", "y": "count"},
)
fig.update_layout(xaxis_tickangle=-20)
fig.show()
Interpreting the sign and magnitude#
sign: \(\tau > 0\) means larger x tends to come with larger y (monotone increasing); \(\tau < 0\) means the opposite.
magnitude: in the no-ties (continuous) case, \(\tau\) has a clean probability interpretation:
\[
\tau = P(\text{concordant}) - P(\text{discordant})
\]
So if \(\tau = 0.30\) (and there are no ties), concordance is about 30 percentage points more likely than discordance for a randomly chosen pair.
Important nuance:
Independence implies \(\tau = 0\), but \(\tau = 0\) does not necessarily imply independence. It means “no monotone tendency detected by this statistic”.
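A quick sketch of that nuance (simulated data, chosen for illustration): here y is a deterministic function of x, yet tau lands near zero because the relationship is symmetric rather than monotone.

```python
import numpy as np

# Simulated data (assumed for illustration): y is a deterministic
# function of x, but the relationship is not monotone.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = x**2

i, j = np.triu_indices(x.size, k=1)
tau_a = np.sum(np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])) / i.size
print(f"tau-a for y = x^2: {tau_a:.3f}")  # near zero despite total dependence
```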
# Among comparable (non-tied) pairs, what fraction are concordant vs discordant?
comparable = counts["C"] + counts["D"]
p_conc = counts["C"] / comparable
p_disc = counts["D"] / comparable
print(f"Comparable pairs: {comparable} / {counts['n_pairs']} total")
print(f"P(concordant | comparable) = {p_conc:.3f}")
print(f"P(discordant | comparable) = {p_disc:.3f}")
Comparable pairs: 10 / 10 total
P(concordant | comparable) = 0.900
P(discordant | comparable) = 0.100
3) Ties (and why tau-b exists)#
With ordinal/discrete data, ties are common. Ties create a practical issue:
tau-a divides by the total number of pairs \(\binom{n}{2}\), even though many pairs might be “uninformative” because of ties
as a result, even a perfectly monotone relationship with many ties can have \(|\tau_a| < 1\)
Tau-b fixes this by rescaling based on how many pairs are actually comparable in \(x\) and in \(y\).
# An ordinal-ish example with ties
x_tie = np.array([1, 1, 2, 2, 3, 3])
y_tie = np.array([1, 1, 2, 3, 3, 3])
tau_a, counts_a = kendall_tau_a(x_tie, y_tie)
tau_b, counts_b = kendall_tau_b(x_tie, y_tie)
print(f"tau-a = {tau_a:.3f}")
print(f"tau-b = {tau_b:.3f}")
counts_b
tau-a = 0.667
tau-b = 0.870
{'n': 6,
'n_pairs': 15,
'C': 10,
'D': 0,
'T_x': 1,
'T_y': 2,
'T_xy': 2,
'S': 10,
'denom': 11.489125293076057}
# Visualize ties with a little jitter so points don't sit exactly on top of each other
jitter = 0.06
xj = x_tie + rng.normal(0, jitter, size=x_tie.size)
yj = y_tie + rng.normal(0, jitter, size=y_tie.size)
fig = px.scatter(
x=xj,
y=yj,
title="Ordinal data with ties (visualized with small jitter)",
labels={"x": "x (jittered)", "y": "y (jittered)"},
)
fig.add_annotation(
x=0.02,
y=0.98,
xref="paper",
yref="paper",
showarrow=False,
align="left",
text=f"tau-a = {tau_a:.3f}<br>tau-b = {tau_b:.3f}",
)
fig.show()
4) Visual intuition: the concordance matrix#
For a small dataset, you can visualize every pair’s contribution.
We build a matrix:
+1 (red) means the pair is concordant
-1 (blue) means the pair is discordant
0 (white) means a tie in x or y (or the diagonal)
This is literally what \(S\) sums over (for \(i < j\)).
n_small = 12
x_small = np.arange(n_small)
# Mostly increasing, but with noise so we get some discordant pairs
y_small = x_small + rng.normal(0, 2.0, size=n_small)
# Build the full matrix for visualization (O(n^2) but tiny here)
dx = x_small[:, None] - x_small[None, :]
dy = y_small[:, None] - y_small[None, :]
M = np.sign(dx) * np.sign(dy)
np.fill_diagonal(M, 0)
fig = px.imshow(
M,
zmin=-1,
zmax=1,
color_continuous_scale="RdBu",
title="Concordance matrix M (red=concordant, blue=discordant)",
labels=dict(x="j", y="i", color="sign"),
)
fig.update_layout(coloraxis_colorbar=dict(tickvals=[-1, 0, 1]))
fig.show()
# Check that summing the upper triangle matches S
_, counts_small = kendall_tau_a(x_small, y_small)
S_from_matrix = int(np.sum(np.triu(M, k=1)))
print("S (from counts):", counts_small["S"])
print("S (from matrix):", S_from_matrix)
S (from counts): 48
S (from matrix): 48
5) Tau cares about order, not the scale#
Because tau is built from comparisons (is \(x_i > x_j\)?), it is invariant to strictly monotone transformations.
Example: if you replace \(x\) with \(\exp(x)\) (strictly increasing), the ordering doesn’t change — and tau doesn’t change.
This is a big reason tau is popular for:
ordinal scales
heavy-tailed data
relationships that are monotonic but not linear
# Nonlinear but monotonic relationship
n = 80
x_nl = rng.normal(size=n)
y_nl = x_nl**3 + rng.normal(0, 1.5, size=n)
tau_raw, _ = kendall_tau_b(x_nl, y_nl)
tau_exp, _ = kendall_tau_b(np.exp(x_nl), y_nl)
pearson = np.corrcoef(x_nl, y_nl)[0, 1]
print(f"Kendall tau-b (x, y) = {tau_raw:.3f}")
print(f"Kendall tau-b (exp(x), y) = {tau_exp:.3f}")
print(f"Pearson corr (x, y) = {pearson:.3f}")
fig = px.scatter(
x=x_nl,
y=y_nl,
title="Monotonic but nonlinear relationship (y = x^3 + noise)",
labels={"x": "x", "y": "y"},
)
fig.add_annotation(
x=0.02,
y=0.98,
xref="paper",
yref="paper",
showarrow=False,
align="left",
text=f"Kendall tau-b = {tau_raw:.3f}<br>Pearson r = {pearson:.3f}",
)
fig.show()
Kendall tau-b (x, y) = 0.309
Kendall tau-b (exp(x), y) = 0.309
Pearson corr (x, y) = 0.589
6) Hypothesis testing: is the association more than chance?#
A common hypothesis test is:
H0: \(X\) and \(Y\) are independent (no association); under H0 the expected tau is 0
H1: there is an association (two-sided), or specifically increasing/decreasing (one-sided)
Permutation test (recommended for learning and for small samples)#
Under H0, the pairing between x and y is arbitrary.
So we:
compute the observed \(\tau\)
repeatedly permute y (break any real association)
recompute \(\tau\) for each permutation
see how extreme the observed \(\tau\) is relative to this null distribution
This uses only the assumption of exchangeability under H0 (a good match for independent observations).
def kendall_permutation_test(
x,
y,
*,
n_resamples=2000,
alternative="two-sided",
rng=None,
):
"""Permutation test for association using Kendall's tau-b.
Returns (tau_obs, p_value, tau_perm).
alternative:
- "two-sided": |tau_perm| >= |tau_obs|
- "greater": tau_perm >= tau_obs
- "less": tau_perm <= tau_obs
"""
if rng is None:
rng = np.random.default_rng()
x, y = _clean_xy(x, y)
tau_obs, _ = kendall_tau_b(x, y)
if not np.isfinite(tau_obs):
raise ValueError("Observed tau is not finite; check for constant inputs or all-ties.")
tau_perm = np.empty(n_resamples, dtype=float)
for b in range(n_resamples):
y_perm = rng.permutation(y)
tau_perm[b], _ = kendall_tau_b(x, y_perm)
alternative = alternative.lower()
if alternative == "two-sided":
extreme = np.abs(tau_perm) >= abs(tau_obs)
elif alternative == "greater":
extreme = tau_perm >= tau_obs
elif alternative == "less":
extreme = tau_perm <= tau_obs
else:
raise ValueError("alternative must be one of: 'two-sided', 'greater', 'less'.")
# +1 smoothing avoids p=0
p_value = (np.sum(extreme) + 1) / (n_resamples + 1)
return float(tau_obs), float(p_value), tau_perm
# Example: moderately monotone association
n = 60
x_ex = rng.normal(size=n)
y_ex = np.tanh(1.2 * x_ex) + rng.normal(0, 0.35, size=n)
tau_obs, p_value, tau_null = kendall_permutation_test(x_ex, y_ex, n_resamples=3000, rng=rng)
print(f"Observed tau-b = {tau_obs:.3f}")
print(f"Permutation p-value (two-sided) = {p_value:.4f}")
fig = px.histogram(
tau_null,
nbins=50,
title="Permutation null distribution of Kendall tau-b (H0: independence)",
labels={"value": "tau-b (permuted)"},
)
fig.add_vline(
x=tau_obs,
line_width=3,
line_color="crimson",
annotation_text=f"observed tau = {tau_obs:.3f}",
annotation_position="top",
)
fig.add_vline(
x=-tau_obs,
line_width=3,
line_color="crimson",
line_dash="dot",
annotation_text="-observed",
annotation_position="top",
)
fig.show()
Observed tau-b = 0.678
Permutation p-value (two-sided) = 0.0003
Interpreting the test#
A small p-value means the observed ordering agreement (tau) is unlikely under H0.
Always report tau itself as the effect size.
Two practical reminders:
With large samples, even tiny tau values can be “statistically significant”.
With many ties (discrete data), prefer tau-b and/or a permutation test.
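The first reminder can be made concrete with the no-ties variance of \(S\) used in the next section (the numbers here are assumed for illustration):

```python
import math

# Assumed numbers for illustration: a tiny tau-a of 0.02 at n = 10,000.
n = 10_000
tau = 0.02
n_pairs = n * (n - 1) // 2
S = tau * n_pairs                       # S implied by this tau-a
var_s = n * (n - 1) * (2 * n + 5) / 18  # no-ties null variance of S
z = S / math.sqrt(var_s)
print(f"z = {z:.2f}")  # ≈ 3.0: "significant", yet the effect is negligible
```

A z-score near 3 is comfortably past conventional thresholds, even though a tau of 0.02 means concordance beats discordance by only two percentage points.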
7) (Optional) Large-sample normal approximation (no ties)#
For the continuous case (no ties), Kendall’s \(S\) has an asymptotic normal distribution under H0.
For sample size \(n\) (no ties):
\[
\operatorname{Var}(S) = \frac{n(n-1)(2n+5)}{18}
\]
So a z-score is:
\[
z = \frac{S}{\sqrt{\operatorname{Var}(S)}}
\]
This is fast, but:
it’s only accurate-ish for larger \(n\)
tie corrections make the variance formula more complicated
permutation tests are usually easier to trust when learning
import math
def _normal_cdf(z):
return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
def kendall_tau_a_asymptotic_test(x, y, *, alternative="two-sided"):
"""Asymptotic z-test using S variance formula (no ties)."""
_, counts = kendall_tau_a(x, y)
n = counts["n"]
S = counts["S"]
if n < 2:
return np.nan, np.nan
var_s = n * (n - 1) * (2 * n + 5) / 18
z = S / math.sqrt(var_s)
alternative = alternative.lower()
if alternative == "two-sided":
p = 2 * (1 - _normal_cdf(abs(z)))
elif alternative == "greater":
p = 1 - _normal_cdf(z)
elif alternative == "less":
p = _normal_cdf(z)
else:
raise ValueError("alternative must be one of: 'two-sided', 'greater', 'less'.")
return float(z), float(p)
# Compare (roughly) to permutation on the same example when there are effectively no ties
z, p_asym = kendall_tau_a_asymptotic_test(x_ex, y_ex)
print(f"Asymptotic z (tau-a) = {z:.3f}")
print(f"Asymptotic p-value = {p_asym:.4f}")
Asymptotic z (tau-a) = 7.654
Asymptotic p-value = 0.0000
8) Bootstrap confidence interval (effect size uncertainty)#
A p-value answers “is it plausible tau is 0?”, but you often also want:
an uncertainty interval for tau itself
A simple approach is a bootstrap:
resample the dataset with replacement
recompute tau for each bootstrap sample
take percentiles for a CI
(Like the permutation test, this is straightforward to implement and visualize.)
def bootstrap_tau_b(x, y, *, n_boot=2000, ci=0.95, rng=None):
if rng is None:
rng = np.random.default_rng()
x, y = _clean_xy(x, y)
n = x.size
tau_samples = np.empty(n_boot, dtype=float)
for b in range(n_boot):
idx = rng.integers(0, n, size=n)
tau_samples[b], _ = kendall_tau_b(x[idx], y[idx])
alpha = 1 - ci
lo = np.quantile(tau_samples, alpha / 2)
hi = np.quantile(tau_samples, 1 - alpha / 2)
return tau_samples, float(lo), float(hi)
tau_boot, lo, hi = bootstrap_tau_b(x_ex, y_ex, n_boot=3000, rng=rng)
print(f"Bootstrap 95% CI for tau-b: [{lo:.3f}, {hi:.3f}]")
fig = px.histogram(
tau_boot,
nbins=60,
title="Bootstrap distribution of Kendall tau-b",
labels={"value": "tau-b (bootstrap)"},
)
fig.add_vline(x=lo, line_color="black", line_dash="dot", annotation_text="CI low")
fig.add_vline(x=hi, line_color="black", line_dash="dot", annotation_text="CI high")
fig.add_vline(x=tau_obs, line_color="crimson", annotation_text="observed")
fig.show()
Bootstrap 95% CI for tau-b: [0.577, 0.765]
9) Diagnostics and pitfalls#
Independence of observations matters. If you have time series or repeated measures, tau’s usual p-values can be misleading.
Ties are common in ordinal data → prefer tau-b.
Effect size vs significance: don’t stop at “p < 0.05”. Report tau (and ideally a CI).
Complexity: this reference implementation is \(O(n^2)\). For large datasets, use an optimized library implementation.
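The first pitfall deserves a small demonstration (a sketch; the seed and sizes are arbitrary): two independent random walks routinely produce |tau| values far beyond what the i.i.d. null would suggest, because consecutive observations are not exchangeable.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 200, 100
# Standard deviation of tau-a under independence of i.i.d. observations
null_sd = np.sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))
i, j = np.triu_indices(n, k=1)

abs_taus = []
for _ in range(reps):
    x = np.cumsum(rng.normal(size=n))  # two *independent* random walks
    y = np.cumsum(rng.normal(size=n))
    tau = np.sum(np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])) / i.size
    abs_taus.append(abs(tau))

print(f"i.i.d. null sd of tau ≈ {null_sd:.3f}")
print(f"mean |tau| between independent walks = {np.mean(abs_taus):.3f}")
```

The typical |tau| here is several times the i.i.d. null standard deviation, so p-values computed as if observations were exchangeable would be badly overconfident.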
Exercises#
Create a dataset where Pearson correlation is near 0 but Kendall tau is clearly non-zero (hint: monotone but nonlinear).
Verify that the one-sided alternatives in kendall_permutation_test behave as expected.
Stress-test the \(O(n^2)\) implementation with increasing n and plot runtime.
References#
Kendall, M. (1938). A New Measure of Rank Correlation.
SciPy: scipy.stats.kendalltau (a production-ready implementation and additional details).